library(tidyverse)
library(janitor)
library(gt)
library(readxl)

Reading in the data

In this R Markdown file, the Excel file that is read in is called analytic_data.xlsx and the data frame is called EXAMPLE_DATA. Replace these with the names of your own file and data frame.

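# Read the Excel file into a data frame
EXAMPLE_DATA <- read_excel("analytic_data.xlsx")
# Convert character columns to factors so they are treated as categorical
EXAMPLE_DATA <- EXAMPLE_DATA %>%
  mutate(across(where(is.character), as.factor))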

In all of the code below, replace EXAMPLE_DATA with the name of your data frame, and replace NUMERICAL_VARIABLE1, CATEGORICAL_VARIABLE1 and CATEGORICAL_VARIABLE2 with the names of your own variables.
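If you do not have an Excel file to hand, the code in this document can still be tried on a small made-up data frame. This is only a sketch: the values below are invented for illustration and are not the data behind the tables shown in this document. The variable names match the placeholders used throughout.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the data read from Excel; all values are made up
EXAMPLE_DATA <- tibble(
  NUMERICAL_VARIABLE1   = c(2.5, 3.1, 4.0, 5.2, 1.5, 4.4, 6.3, 10.3),
  CATEGORICAL_VARIABLE1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
  CATEGORICAL_VARIABLE2 = c("Y", "Y", "N", "Y", "N", "N", "Y", "N")
) %>%
  # Same conversion as for the Excel data: character columns become factors
  mutate(across(where(is.character), as.factor))

# Check the column types before running the summaries
str(EXAMPLE_DATA)
```

With this data frame in place, each of the chunks below should run without changing any names.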

One numerical variable

EXAMPLE_DATA %>%
  summarise(Mean = mean(NUMERICAL_VARIABLE1),
            "Standard Deviation" = sd(NUMERICAL_VARIABLE1),
            n = n()) %>%
  gt() %>%
  fmt_number(c(Mean, "Standard Deviation"),
             decimals = 1) %>%
  tab_spanner(
    label = "Nice label for NUMERICAL_VARIABLE1",
    columns = c(1:3))
Nice label for NUMERICAL_VARIABLE1
Mean    Standard Deviation    n
4.3     2.3                   20
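One thing the summary above does not handle is missing data: mean() and sd() return NA if any value is missing, and n() counts every row. A minimal sketch, assuming you want to ignore missing values, is to pass na.rm = TRUE and count the non-missing values instead. The data frame toy is hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical data with one missing value
toy <- tibble(NUMERICAL_VARIABLE1 = c(2, 4, NA, 6))

toy_summary <- toy %>%
  summarise(Mean = mean(NUMERICAL_VARIABLE1, na.rm = TRUE),
            "Standard Deviation" = sd(NUMERICAL_VARIABLE1, na.rm = TRUE),
            # Count only the values that are actually present
            n = sum(!is.na(NUMERICAL_VARIABLE1)))

print(toy_summary)
```

Whether dropping missing values is appropriate depends on your data, so it is worth reporting how many values were missing alongside the summary.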

One numerical variable and one categorical variable

Mean, SD and n

EXAMPLE_DATA %>%
  group_by(CATEGORICAL_VARIABLE1) %>%
  summarise(Mean = mean(NUMERICAL_VARIABLE1),
            "Standard Deviation" = sd(NUMERICAL_VARIABLE1),
            n = n()) %>%
  gt() %>%
  fmt_number(c(Mean, "Standard Deviation"),
             decimals = 1)
CATEGORICAL_VARIABLE1    Mean    Standard Deviation    n
A                        4.0     1.4                   10
B                        4.7     2.9                   10

Five number summary, including median and quartiles

EXAMPLE_DATA %>%
  group_by(CATEGORICAL_VARIABLE1) %>%
  summarise(n = n(),
            Minimum = min(NUMERICAL_VARIABLE1),
            "First quartile" = quantile(NUMERICAL_VARIABLE1, 0.25),
            Median = median(NUMERICAL_VARIABLE1),
            "Third quartile" = quantile(NUMERICAL_VARIABLE1, 0.75),
            Maximum = max(NUMERICAL_VARIABLE1)) %>%
  gt() %>%
  fmt_number(c(Minimum, "First quartile", Median, "Third quartile", Maximum),
             decimals = 1)
CATEGORICAL_VARIABLE1    n     Minimum    First quartile    Median    Third quartile    Maximum
A                        10    2.5        2.8               3.8       4.6               7.0
B                        10    1.5        2.7               4.1       5.8               10.3
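Note that quartiles can be computed in several ways. R's quantile() uses its type 7 method by default, so values may differ slightly from those reported by other software. The sketch below, on made-up numbers, compares the default with type = 6, which is one of the alternatives available through the type argument.

```r
x <- c(1, 2, 3, 4)  # made-up values

# First quartile with the default method (type 7)
q1_default <- unname(quantile(x, 0.25))

# First quartile with an alternative method (type 6)
q1_type6 <- unname(quantile(x, 0.25, type = 6))

print(c(default = q1_default, type6 = q1_type6))
```

If you need to match another package, the same type argument can be added inside summarise(), for example quantile(NUMERICAL_VARIABLE1, 0.25, type = 6).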

One categorical variable

EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  gt() %>%
  cols_label(percent = "proportion")
CATEGORICAL_VARIABLE1    n     proportion
A                        10    0.5
B                        10    0.5

The adorn functions from janitor can be used to improve the table further. Here, adorn_totals("row") adds a Total row and adorn_pct_formatting() displays the proportions as percentages.

EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  adorn_totals("row") %>%
  adorn_pct_formatting() %>%
  gt()
CATEGORICAL_VARIABLE1    n     percent
A                        10    50.0%
B                        10    50.0%
Total                    20    100.0%
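Because tabyl() and the adorn functions return a data frame, it can help to inspect the result before passing it to gt(). The sketch below does this on a hypothetical five-row data frame called toy; the values are made up.

```r
library(dplyr)
library(janitor)

# Hypothetical data: two A's and three B's
toy <- data.frame(CATEGORICAL_VARIABLE1 = c("A", "A", "B", "B", "B"))

counts <- toy %>%
  tabyl(CATEGORICAL_VARIABLE1) %>%
  adorn_totals("row") %>%     # adds a Total row
  adorn_pct_formatting()      # turns proportions into percentage strings

# Inspect the data frame that gt() would receive
print(counts)
```

After adorn_pct_formatting() the percent column holds text rather than numbers, so any further calculation on the proportions is best done before that step.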

Two categorical variables

EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  gt()
CATEGORICAL_VARIABLE1    N    Y
A                        2    8
B                        7    3

The table can be improved by adding the name of the second categorical variable as a spanner label above its columns, so it is clear what the table represents.

EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  gt() %>%
  tab_spanner(
    label = "CATEGORICAL_VARIABLE2",
    columns = c(N, Y))
                         CATEGORICAL_VARIABLE2
CATEGORICAL_VARIABLE1    N    Y
A                        2    8
B                        7    3

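You can then use the adorn functions to add percentages. Here, adorn_percentages("row") converts the counts to row proportions, adorn_pct_formatting(digits = 0) formats them as whole-number percentages, and adorn_ns(position = "front") puts the counts back in front of the percentages.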

EXAMPLE_DATA %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front") %>%
  gt() %>%
  tab_spanner(
    label = "CATEGORICAL_VARIABLE2",
    columns = c(N, Y))
                         CATEGORICAL_VARIABLE2
CATEGORICAL_VARIABLE1    N          Y
A                        2 (20%)    8 (80%)
B                        7 (70%)    3 (30%)
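The "row" argument to adorn_percentages() sets the denominator: each count is divided by its row total. A sketch of the alternative, using "col" so that each count is divided by its column total, is shown below on a hypothetical data frame called toy with made-up values.

```r
library(dplyr)
library(janitor)

# Hypothetical data: A has 1 N and 3 Y; B has 2 N and 0 Y
toy <- data.frame(
  CATEGORICAL_VARIABLE1 = c("A", "A", "A", "A", "B", "B"),
  CATEGORICAL_VARIABLE2 = c("N", "Y", "Y", "Y", "N", "N")
)

# Proportions within each row (each row sums to 1)
row_pct <- toy %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  adorn_percentages("row")

# Proportions within each column (each column sums to 1)
col_pct <- toy %>%
  tabyl(CATEGORICAL_VARIABLE1, CATEGORICAL_VARIABLE2) %>%
  adorn_percentages("col")

print(row_pct)
print(col_pct)
```

Which denominator to use depends on the comparison you want to make: row percentages compare the levels of CATEGORICAL_VARIABLE2 within each level of CATEGORICAL_VARIABLE1, and column percentages do the reverse.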

© Statistical Consulting Centre, University of Melbourne, 2023